## Virtual Memory

#### **Definitions**

- Cache
  - Copy of data that is faster to access than the original
  - Hit: if cache has copy
  - Miss: if cache does not have copy
- Cache block
  - Unit of cache storage (multiple memory locations)
- Temporal locality
  - Programs tend to reference the same memory locations multiple times
  - Example: instructions in a loop
- Spatial locality
  - Programs tend to reference nearby locations
  - Example: data in a loop

## Cache Concept (Read)



## Cache Concept (Write)



Write back: changes stored in cache until cache block is replaced

## Memory Hierarchy

| Cache                            | Hit Cost    | Size   |
|----------------------------------|-------------|--------|
| 1st level cache/first level TLB  | 1 ns        | 64 KB  |
| 2nd level cache/second level TLB | 4 ns        | 256 KB |
| 3rd level cache                  | 12 ns       | 2 MB   |
| Memory (DRAM)                    | 100 ns      | 10 GB  |
| Data center memory (DRAM)        | 100 $\mu$ s | 100 TB |
| Local non-volatile memory        | 100 $\mu$ s | 100 GB |
| Local disk                       | 10 ms       | 1 TB   |
| Data center disk                 | 10 ms       | 100 PB |
| Remote data center disk          | 200 ms      | 1 XB   |

i7 has 8MB as shared 3<sup>rd</sup> level cache; 2<sup>nd</sup> level cache is per-core

### Cache hit rate



## Example of cache hit rate over time



#### Main Points

- Can we provide the illusion of near infinite memory in limited physical memory?
  - Demand-paged virtual memory
  - Memory-mapped files
- How do we choose which page to replace?
  - FIFO, MIN, LRU, LFU, Clock
- What types of workloads does caching work for, and how well?
  - Spatial/temporal locality vs. Zipf workloads

# Hardware address translation is a power tool

- Kernel trap on read/write to selected addresses
  - Copy on write
  - Fill on reference

- Demand paged virtual memory
- Memory mapped files
- Modified bit emulation
- Use bit emulation

## Demand Paging (Before)



## Demand Paging (After)



## **Demand Paging**

- 1. TLB miss
- 2. Page table walk
- Page fault (page invalid in page table)
- 4. Trap to kernel
- Convert virtual address to file + offset
- 6. Allocate page frame
  - Evict page if needed
- 7. Initiate disk block read into page frame

- Disk interrupt when DMA complete
- 9. Mark page as valid
- 10. Resume process at faulting instruction
- 11. TLB miss
- 12. Page table walk to fetch translation
- 13. Execute instruction

### Allocating a Page Frame

- Select old page to evict
- Find all page table entries that refer to old page
  - If page frame is shared
- Set each page table entry to invalid
- Remove any TLB entries
  - Copies of now invalid page table entry
- Write changes on page back to disk, if necessary

## How do we know if page has been modified?

- Every page table entry has some bookkeeping
  - Has page been modified?
    - Set by hardware on store instruction
    - In both TLB and page table entry
  - Has page been recently used?
    - Set by hardware on in page table entry on every TLB miss
- Bookkeeping bits can be reset by the OS kernel
  - When changes to page are flushed to disk
  - To track whether page is recently used

## Keeping Track of Page Modifications (Before)



# Keeping Track of Page Modifications (After)



## Virtual or Physical Dirty/Use Bits

- Most machines keep dirty/use bits in the page table entry
- Physical page is
  - Modified if any page table entry that points to it is modified
  - Recently used if any page table entry that points to it is recently used
- On MIPS, simpler to keep dirty/use bits in the core map
  - Core map: map of physical page frames

## Models for Application File I/O

- Explicit read/write system calls
  - Data copied to user process using system call
  - Application operates on data
  - Data copied back to kernel using system call
- Memory-mapped files
  - Open file as a memory segment
  - Program uses load/store instructions on segment memory, implicitly operating on the file
  - Page fault if portion of file is not yet in memory
  - Kernel brings missing blocks into memory, restarts process

### Advantages to Memory-mapped Files

- Programming simplicity, esp for large files
  - Operate directly on file, instead of copy in/copy out
- Zero-copy I/O
  - Data brought from disk directly into page frame
- Pipelining
  - Process can start working before all the pages are populated
- Interprocess communication
  - Shared memory segment vs. temporary file

## Cache Replacement Policy

- On a cache miss, how do we choose which entry to replace?
  - Assuming the new entry is more likely to be used in the near future
  - In direct mapped caches, not an issue!

- Policy goal: reduce cache misses
  - Improve expected case performance
  - Also: reduce likelihood of very poor performance

## A Simple Policy

- Random?
  - Replace a random entry

- FIFO?
  - Replace the entry that has been in the cache the longest time
  - What could go wrong?

#### FIFO in Action

| Reference | Α | В | С | D | Е | Α | В | С | D | Е | Α | В | С | D | Е |
|-----------|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1         | Α |   |   |   | Е |   |   |   | D |   |   |   | С |   |   |
| 2         |   | В |   |   |   | Α |   |   |   | Е |   |   |   | D |   |
| 3         |   |   | С |   |   |   | В |   |   |   | Α |   |   |   | Е |
| 4         |   |   |   | D |   |   |   | С |   |   |   | В |   |   |   |

Worst case for FIFO is if program strides through memory that is larger than the cache

### MIN, LRU, LFU

#### MIN

- Replace the cache entry that will not be used for the longest time into the future
- Optimality proof based on exchange: if evict an entry used sooner, that will trigger an earlier cache miss
- Least Recently Used (LRU)
  - Replace the cache entry that has not been used for the longest time in the past
  - Approximation of MIN
- Least Frequently Used (LFU)
  - Replace the cache entry used the least often (in the recent past)

## LRU/MIN for Sequential Scan

|           | LRU |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
|-----------|-----|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reference | Α   | В | С | D | Е | Α | В | С | D | Е | Α | В | С | D | Е |
| 1         | Α   |   |   |   | Е |   |   |   | D |   |   |   | С |   |   |
| 2         |     | В |   |   |   | Α |   |   |   | Е |   |   |   | D |   |
| 3         |     |   | С |   |   |   | В |   |   |   | Α |   |   |   | Е |
| 4         |     |   |   | D |   |   |   | С |   |   |   | В |   |   |   |
| MIN       |     |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| 1         | Α   |   |   |   |   | + |   |   |   |   | + |   |   | + |   |
| 2         |     | В |   |   |   |   | + |   |   |   |   | + | С |   |   |
| 3         |     |   | С |   |   |   |   | + | D |   |   |   |   | + |   |
| 4         |     |   |   | D | Е |   |   |   |   | + |   |   |   |   | + |

| LRU       |   |   |   |   |   |   |     |   |   |   |   |   |   |   |   |
|-----------|---|---|---|---|---|---|-----|---|---|---|---|---|---|---|---|
| Reference | Α | В | Α | С | В | D | Α   | D | Е | D | Α | Е | В | Α | С |
| 1         | Α |   | + |   |   |   | +   |   |   |   | + |   |   | + |   |
| 2         |   | В |   |   | + |   |     |   |   |   |   |   | + |   |   |
| 3         |   |   |   | С |   |   |     |   | Е |   |   | + |   |   |   |
| 4         |   |   |   |   |   | D |     | + |   | + |   |   |   |   | С |
| FIFO      |   |   |   |   |   |   |     |   |   |   |   |   |   |   |   |
| 1         | Α |   | + |   |   |   | +   |   | Е |   |   |   |   |   |   |
| 2         |   | В |   |   | + |   |     |   |   |   | Α |   |   | + |   |
| 3         |   |   |   | С |   |   |     |   |   |   |   | + | В |   |   |
| 4         |   |   |   |   |   | D |     | + |   | + |   |   |   |   | С |
|           |   |   |   |   |   |   | MIN |   |   |   |   |   |   |   |   |
| 1         | Α |   | + |   |   |   | +   |   |   |   | + |   |   | + |   |
| 2         |   | В |   |   | + |   |     |   |   |   |   |   | + |   | С |
| 3         |   |   |   | С |   |   |     |   | Е |   |   | + |   |   |   |
| 4         |   |   |   |   |   | D |     | + |   | + |   |   |   |   |   |
|           |   |   |   |   |   |   |     |   |   |   |   |   |   |   |   |

## Belady's Anomaly

| FIFO (3 slots) |                |   |   |   |   |   |   |   |   |   |   |   |  |
|----------------|----------------|---|---|---|---|---|---|---|---|---|---|---|--|
| Reference      | Α              | В | С | D | Α | В | Е | Α | В | С | D | E |  |
| 1              | Α              |   |   | D |   |   | Е |   |   |   |   | + |  |
| 2              |                | В |   |   | Α |   |   | + |   | С |   |   |  |
| 3              |                |   | С |   |   | В |   |   | + |   | D |   |  |
|                | FIFO (4 slots) |   |   |   |   |   |   |   |   |   |   |   |  |
| 1              | Α              |   |   |   | + |   | Е |   |   |   | D |   |  |
| 2              |                | В |   |   |   | + |   | Α |   |   |   | E |  |
| 3              |                |   | С |   |   |   |   |   | В |   |   |   |  |
| 4              |                |   |   | D |   |   |   |   |   | С |   |   |  |

### Recap

- MIN is optimal
  - replace the page or cache entry that will be used farthest into the future
- LRU is an approximation of MIN
  - For programs that exhibit spatial and temporal locality